Learning Preference Relations for Information Retrieval
Abstract
In this paper we investigate the problem of learning a preference relation from a given set of ranked documents. We show that the Bayes optimal decision function, when applied to learning a preference relation, may violate transitivity. This is undesirable for information retrieval, because it is in conflict with a document ranking based on the user's preferences. To overcome this problem we present a vector space based method that performs a linear mapping from documents to scalar utility values and thus guarantees transitivity. The learning of the relation between documents is formulated as a classification problem on pairs of documents and is solved using the principle of structural risk minimization for good generalization. The approach is extended to polynomial utility functions by using the potential function method (the so-called "kernel trick"), which allows higher-order correlations of features to be incorporated into the utility function at minimal computational cost. The resulting algorithm is tested on an example with artificial data. The algorithm successfully learns the utility function underlying the training examples and shows good classification performance.

Introduction

The task of supervised learning in information retrieval (IR) is mostly based on the assumption that a given document is either relevant or non-relevant. This holds, for example, for Rocchio's feedback algorithm (Salton 1968) and for the binary independence model (Robertson 1977), which is based on a Bayesian approach. In both cases a classification approach was adopted, and since classifications were considered to be partitions of a set of objects, this reduces to learning equivalence relations from examples. But there is also the view that the similarity of the documents to the query represents the importance of the documents (Salton 1989, p.
317), which in turn means that a user need implies some preference relation on the documents. In (Bollmann & Wong 1987) and (Wong, Yao, & Bollmann 1988) the idea was developed to learn a preference relation instead of an equivalence relation. The learning of preference relations reduces to a standard classification problem if pairs of objects are considered, because a binary relation can be viewed as a subset of the Cartesian product. (Wong, Yao, & Bollmann 1988) successfully applied linear classification and perceptron learning to this problem. In this paper we consider the situation where there are more than two relevance levels and where several documents with different relevance levels all have the same description. We find that an ideal Bayesian approach leads to inconsistencies, namely to the violation of transitivity. To overcome this problem, an algorithm is developed which enforces transitivity by learning a linear mapping from document descriptions to scalar utility values, based on training examples that consist of pairs of document descriptions and their preference relation. The learning procedure is based on the principle of structural risk minimization (SRM) (Vapnik 1995), which is known for its good generalization properties (for an application of SRM to document classification see (Joachims 1997)). The linear approach is generalized to include nonlinear utility functions, which are able to capture correlations between the features, by applying the so-called "kernel trick". The paper is structured as follows: first, the learning of preference relations is formulated as a classification problem on pairs of document descriptions and the inconsistency of the Bayesian approach is demonstrated. Next, the linear vector space model is introduced and structural risk minimization is applied to learn the weight vector. Then this approach is generalized to include nonlinear utility functions by applying the "kernel trick".
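The pair-classification idea can be sketched in a few lines. The following is an illustrative toy implementation (the data and names are our own, and a plain perceptron stands in for the SRM-based learner used in the paper): each training pair (d, d′) with d preferred is turned into the difference vector d − d′, and a linear utility u(d) = w·d is learned by requiring these differences to be classified as positive. Because documents are then ranked by a single scalar, transitivity holds by construction.

```python
# Toy sketch: learning a linear utility from pairwise preferences by
# classifying difference vectors with a perceptron.  Illustrative only;
# the paper itself uses structural risk minimization, not a perceptron.
import numpy as np

def train_utility(pairs, n_features, epochs=100):
    """pairs: list of (preferred, other) feature vectors."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for d, d_prime in pairs:
            diff = np.asarray(d) - np.asarray(d_prime)
            if w @ diff <= 0:      # pair misclassified
                w += diff          # perceptron update on the difference
    return w

# Invented toy data: three documents, d0 preferred over d1 over d2.
docs = [(1.0, 0.0), (0.5, 0.5), (0.0, 0.2)]
pairs = [(docs[0], docs[1]), (docs[1], docs[2]), (docs[0], docs[2])]

w = train_utility(pairs, n_features=2)
scores = [float(w @ np.asarray(d)) for d in docs]
assert scores[0] > scores[1] > scores[2]   # ranking by u is transitive
```

Ranking documents by the learned scalar u(d) reproduces the training preferences and can never produce a preference cycle.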
Finally, we present some numerical experiments to demonstrate the validity of the approach.

The Problem of Transitivity

Let us consider a static document space denoted by D, with documents d ∈ D being represented by feature vectors d = (d1, d2, ..., dn)′ ∈ D, where n denotes the number of features dk. The user determines a preference relation on the documents used for training, and generates a training set S consisting of ℓ pairs (d, d′) of document descriptions together with their relations d ≻ d′.

From: AAAI Technical Report WS-98-05. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

[Table: number of documents at each relevance level for descriptions d, d′, d″ — truncated in extraction]
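The transitivity problem can be made concrete with a hypothetical example in the spirit of this section (the numbers are ours, borrowed from the classic intransitive dice): three relevance distributions play the role of document descriptions that each occur with several different relevance levels, and the Bayes optimal pairwise rule — prefer x over y whenever P(rel(x) > rel(y)) exceeds 1/2 — then produces a preference cycle.

```python
# Illustrative only: intransitive-dice distributions standing in for
# documents whose identical descriptions occur at different relevance
# levels.  Each tuple lists equally likely relevance values.
from itertools import product
from fractions import Fraction

A, B, C = (2, 4, 9), (1, 6, 8), (3, 5, 7)

def p_wins(x, y):
    """Probability that a sample from x exceeds a sample from y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))

# The Bayes optimal pairwise decisions form a cycle: A > B > C > A.
assert p_wins(A, B) > Fraction(1, 2)
assert p_wins(B, C) > Fraction(1, 2)
assert p_wins(C, A) > Fraction(1, 2)
```

Each pairwise decision is individually optimal, yet together they cannot be realized by any ranking of the three descriptions, which is exactly the inconsistency the utility-based approach rules out.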
Similar Articles
Mining Preference Relations to Rank Complex Objects
One of the key tasks in data mining and information retrieval is to learn preference relations between objects. Approaches reported in the literature mainly aim at learning preference relations between objects represented according to the classical attribute-value representation. However, the growing interest in data mining techniques able to directly mine data represented according to more sop...
Relational Gaussian Processes for Learning Preference Relations
Preference learning has received increasing attention in both machine learning and information retrieval. The goal of preference learning is to automatically learn a model to rank entities (e.g., documents, webpages, products, music, etc.) according to their degrees of relevance. The particularity of preference learning might be that the training data is a set of pairwise preferences between en...
Fast Active Exploration for Link-Based Preference Learning Using Gaussian Processes
In preference learning, the algorithm observes pairwise relative judgments (preference) between items as training data for learning an ordering of all items. This is an important learning problem for applications where absolute feedback is difficult to elicit, but pairwise judgments are readily available (e.g., via implicit feedback [13]). While it was already shown that active learning can eff...
INCOMPLETE INTERVAL-VALUED HESITANT FUZZY PREFERENCE RELATIONS IN DECISION MAKING
In this article, we propose a method to deal with incomplete interval-valued hesitant fuzzy preference relations. For this purpose, an additive transitivity inspired technique for interval-valued hesitant fuzzy preference relations is formulated which assists in estimating missing preferences. First of all, we introduce a condition for decision makers providing incomplete information. Decision maker...
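As a hedged illustration of the additive-transitivity idea mentioned in this abstract (the matrix below is invented, and plain numeric preferences are used rather than interval-valued hesitant ones): under additive transitivity, p_ij = p_ik + p_kj − 0.5, so a missing entry can be estimated through any intermediate alternative k.

```python
# Simplified sketch: estimating a missing entry of a fuzzy preference
# relation via additive transitivity p[i][j] = p[i][k] + p[k][j] - 0.5.
# The 3x3 matrix is invented for illustration; p[i][j] in [0, 1] is the
# degree to which alternative i is preferred to alternative j.
def estimate_missing(p, i, j, k):
    return p[i][k] + p[k][j] - 0.5

p = [[0.5, 0.7, None],
     [0.3, 0.5, 0.6],
     [None, 0.4, 0.5]]

p[0][2] = estimate_missing(p, 0, 2, k=1)   # route through alternative 1
assert abs(p[0][2] - 0.8) < 1e-9
```

Real interval-valued hesitant relations would apply the same formula endpoint-wise to sets of intervals rather than to single numbers.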
From ranking to intransitive preference learning: rock-paper-scissors and beyond
In different fields like decision making, psychology, game theory and biology, it has been observed that paired-comparison data like preference relations defined by humans and animals can be intransitive. The relations may resemble the well-known game of rock-paper-scissors. In the game, rock defeats scissors and scissors defeat paper, but rock loses to paper. Intransitive relations cannot be m...
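A minimal sketch of the rock-paper-scissors relation described above (the helper function is our own): encoding the "beats" relation as a set of pairs and checking transitivity directly confirms that no transitive ordering, and hence no scalar utility function, can reproduce it.

```python
# Rock-paper-scissors as an intransitive preference relation.  A utility
# model would require u(rock) > u(scissors) > u(paper) > u(rock), which
# is contradictory, so transitivity must fail somewhere in the relation.
beats = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}

def transitive(rel):
    """True iff (a, b) and (b, c) in rel always imply (a, c) in rel."""
    return all((a, c) in rel
               for a, b1 in rel
               for b2, c in rel
               if b1 == b2 and a != c)

assert not transitive(beats)   # the cycle breaks transitivity
```

For comparison, a chain like {(a, b), (b, c), (a, c)} passes the same check, which is why it can be represented by a utility function while rock-paper-scissors cannot.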